83 research outputs found

    One Parser to Rule Them All

    Get PDF
    Despite the long history of research in parsing, constructing parsers for real programming languages remains a difficult and painful task. In the last decades, different parser generators emerged to allow the construction of parsers from a BNF-like specification. However, still today, many parsers are handwritten, or are only partly generated, and include various hacks to deal with different peculiarities in programming languages. The main problem is that current declarative syntax definition techniques are based on pure context-free grammars, while many constructs found in programming languages require context information. In this paper we propose a parsing framework that embraces context information in its core. Our framework is based on data-dependent grammars, which extend context-free grammars with arbitrary computation, variable binding and constraints. We present an implementation of our framework on top of the Generalized LL (GLL) parsing algorithm, and show how common idioms in syntax of programming languages such as (1) lexical disambiguation filters, (2) operator precedence, (3) indentation-sensitive rules, and (4) conditional preprocessor directives can be mapped to data-dependent grammars. We demonstrate the initial experience with our framework, by parsing more than 20000 Java, C#, Haskell, and OCaml source files

    A tool box for compiler construction

    Full text link

    Operator precedence for data-dependent grammars

    Get PDF
    Constructing parsers based on declarative specification of operator precedence is a very old research topic, and there are various existing approaches. However, these approaches are either tied to a particular parsing technique, or cannot deal with all corner cases found in programming languages. In this paper we present an implementation of declarative specification of operator precedence for general parsing that (1) is independent of the underlying parsing algorithm, (2) does not require any grammar transformation that increases the size of the grammar, (3) preserves the shape of parse trees of the original, natural grammar, and (4) can deal with intricate cases of operator precedence found in functional programming languages such as OCaml. Our new approach to operator precedence is formulated using data-dependent grammars, which extend context-free grammars with arbitrary computation, variable binding and constraints. We implemented our approach using Iguana, a data-dependent parsing framework, and evaluated it by parsing Java and OCaml source files. The results show that our approach is practical for parsing programming languages with complicated operator precedence rules

    A Federated Design for a Neurobiological Simulation Engine: The CBI Federated Software Architecture

    Get PDF
    Simulator interoperability and extensibility has become a growing requirement in computational biology. To address this, we have developed a federated software architecture. It is federated by its union of independent disparate systems under a single cohesive view, provides interoperability through its capability to communicate, execute programs, or transfer data among different independent applications, and supports extensibility by enabling simulator expansion or enhancement without the need for major changes to system infrastructure. Historically, simulator interoperability has relied on development of declarative markup languages such as the neuron modeling language NeuroML, while simulator extension typically occurred through modification of existing functionality. The software architecture we describe here allows for both these approaches. However, it is designed to support alternative paradigms of interoperability and extensibility through the provision of logical relationships and defined application programming interfaces. They allow any appropriately configured component or software application to be incorporated into a simulator. The architecture defines independent functional modules that run stand-alone. They are arranged in logical layers that naturally correspond to the occurrence of high-level data (biological concepts) versus low-level data (numerical values) and distinguish data from control functions. The modular nature of the architecture and its independence from a given technology facilitates communication about similar concepts and functions for both users and developers. It provides several advantages for multiple independent contributions to software development. Importantly, these include: (1) Reduction in complexity of individual simulator components when compared to the complexity of a complete simulator, (2) Documentation of individual components in terms of their inputs and outputs, (3) Easy removal or replacement of unnecessary or obsoleted components, (4) Stand-alone testing of components, and (5) Clear delineation of the development scope of new components
    • …
    corecore